{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download and Clean Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's start by importing the <em>pandas</em> and the Numpy libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will be using the dataset provided in the assignment\n",
    "\n",
    "<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>\n",
    "\n",
    "<strong>1. Cement</strong>\n",
    "\n",
    "<strong>2. Blast Furnace Slag</strong>\n",
    "\n",
    "<strong>3. Fly Ash</strong>\n",
    "\n",
    "<strong>4. Water</strong>\n",
    "\n",
    "<strong>5. Superplasticizer</strong>\n",
    "\n",
    "<strong>6. Coarse Aggregate</strong>\n",
    "\n",
    "<strong>7. Fine Aggregate</strong>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's read the dataset into a <em>pandas</em> dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cement</th>\n",
       "      <th>Blast Furnace Slag</th>\n",
       "      <th>Fly Ash</th>\n",
       "      <th>Water</th>\n",
       "      <th>Superplasticizer</th>\n",
       "      <th>Coarse Aggregate</th>\n",
       "      <th>Fine Aggregate</th>\n",
       "      <th>Age</th>\n",
       "      <th>Strength</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>540.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>162.0</td>\n",
       "      <td>2.5</td>\n",
       "      <td>1040.0</td>\n",
       "      <td>676.0</td>\n",
       "      <td>28</td>\n",
       "      <td>79.99</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>540.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>162.0</td>\n",
       "      <td>2.5</td>\n",
       "      <td>1055.0</td>\n",
       "      <td>676.0</td>\n",
       "      <td>28</td>\n",
       "      <td>61.89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>332.5</td>\n",
       "      <td>142.5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>228.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>932.0</td>\n",
       "      <td>594.0</td>\n",
       "      <td>270</td>\n",
       "      <td>40.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>332.5</td>\n",
       "      <td>142.5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>228.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>932.0</td>\n",
       "      <td>594.0</td>\n",
       "      <td>365</td>\n",
       "      <td>41.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>198.6</td>\n",
       "      <td>132.4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>978.4</td>\n",
       "      <td>825.5</td>\n",
       "      <td>360</td>\n",
       "      <td>44.30</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Cement  Blast Furnace Slag  Fly Ash  Water  Superplasticizer  \\\n",
       "0   540.0                 0.0      0.0  162.0               2.5   \n",
       "1   540.0                 0.0      0.0  162.0               2.5   \n",
       "2   332.5               142.5      0.0  228.0               0.0   \n",
       "3   332.5               142.5      0.0  228.0               0.0   \n",
       "4   198.6               132.4      0.0  192.0               0.0   \n",
       "\n",
       "   Coarse Aggregate  Fine Aggregate  Age  Strength  \n",
       "0            1040.0           676.0   28     79.99  \n",
       "1            1055.0           676.0   28     61.89  \n",
       "2             932.0           594.0  270     40.27  \n",
       "3             932.0           594.0  365     41.05  \n",
       "4             978.4           825.5  360     44.30  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concrete_data = pd.read_csv('concrete_data.csv')\n",
    "concrete_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the first concrete sample has 540 cubic meter of cement, 0 cubic meter of blast furnace slag, 0 cubic meter of fly ash, 162 cubic meter of water, 2.5 cubic meter of superplaticizer, 1040 cubic meter of coarse aggregate, 676 cubic meter of fine aggregate. Such a concrete mix which is 28 days old, has a compressive strength of 79.99 MPa. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's check how many data points we have."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1030, 9)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concrete_data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, there are approximately 1000 samples to train our model on. Because of the few samples, we have to be careful not to overfit the training data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's check the dataset for any missing values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cement</th>\n",
       "      <th>Blast Furnace Slag</th>\n",
       "      <th>Fly Ash</th>\n",
       "      <th>Water</th>\n",
       "      <th>Superplasticizer</th>\n",
       "      <th>Coarse Aggregate</th>\n",
       "      <th>Fine Aggregate</th>\n",
       "      <th>Age</th>\n",
       "      <th>Strength</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "      <td>1030.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>281.167864</td>\n",
       "      <td>73.895825</td>\n",
       "      <td>54.188350</td>\n",
       "      <td>181.567282</td>\n",
       "      <td>6.204660</td>\n",
       "      <td>972.918932</td>\n",
       "      <td>773.580485</td>\n",
       "      <td>45.662136</td>\n",
       "      <td>35.817961</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>104.506364</td>\n",
       "      <td>86.279342</td>\n",
       "      <td>63.997004</td>\n",
       "      <td>21.354219</td>\n",
       "      <td>5.973841</td>\n",
       "      <td>77.753954</td>\n",
       "      <td>80.175980</td>\n",
       "      <td>63.169912</td>\n",
       "      <td>16.705742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>102.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>121.800000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>801.000000</td>\n",
       "      <td>594.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.330000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>192.375000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>164.900000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>932.000000</td>\n",
       "      <td>730.950000</td>\n",
       "      <td>7.000000</td>\n",
       "      <td>23.710000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>272.900000</td>\n",
       "      <td>22.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>185.000000</td>\n",
       "      <td>6.400000</td>\n",
       "      <td>968.000000</td>\n",
       "      <td>779.500000</td>\n",
       "      <td>28.000000</td>\n",
       "      <td>34.445000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>350.000000</td>\n",
       "      <td>142.950000</td>\n",
       "      <td>118.300000</td>\n",
       "      <td>192.000000</td>\n",
       "      <td>10.200000</td>\n",
       "      <td>1029.400000</td>\n",
       "      <td>824.000000</td>\n",
       "      <td>56.000000</td>\n",
       "      <td>46.135000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>540.000000</td>\n",
       "      <td>359.400000</td>\n",
       "      <td>200.100000</td>\n",
       "      <td>247.000000</td>\n",
       "      <td>32.200000</td>\n",
       "      <td>1145.000000</td>\n",
       "      <td>992.600000</td>\n",
       "      <td>365.000000</td>\n",
       "      <td>82.600000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Cement  Blast Furnace Slag      Fly Ash        Water  \\\n",
       "count  1030.000000         1030.000000  1030.000000  1030.000000   \n",
       "mean    281.167864           73.895825    54.188350   181.567282   \n",
       "std     104.506364           86.279342    63.997004    21.354219   \n",
       "min     102.000000            0.000000     0.000000   121.800000   \n",
       "25%     192.375000            0.000000     0.000000   164.900000   \n",
       "50%     272.900000           22.000000     0.000000   185.000000   \n",
       "75%     350.000000          142.950000   118.300000   192.000000   \n",
       "max     540.000000          359.400000   200.100000   247.000000   \n",
       "\n",
       "       Superplasticizer  Coarse Aggregate  Fine Aggregate          Age  \\\n",
       "count       1030.000000       1030.000000     1030.000000  1030.000000   \n",
       "mean           6.204660        972.918932      773.580485    45.662136   \n",
       "std            5.973841         77.753954       80.175980    63.169912   \n",
       "min            0.000000        801.000000      594.000000     1.000000   \n",
       "25%            0.000000        932.000000      730.950000     7.000000   \n",
       "50%            6.400000        968.000000      779.500000    28.000000   \n",
       "75%           10.200000       1029.400000      824.000000    56.000000   \n",
       "max           32.200000       1145.000000      992.600000   365.000000   \n",
       "\n",
       "          Strength  \n",
       "count  1030.000000  \n",
       "mean     35.817961  \n",
       "std      16.705742  \n",
       "min       2.330000  \n",
       "25%      23.710000  \n",
       "50%      34.445000  \n",
       "75%      46.135000  \n",
       "max      82.600000  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concrete_data.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Cement                0\n",
       "Blast Furnace Slag    0\n",
       "Fly Ash               0\n",
       "Water                 0\n",
       "Superplasticizer      0\n",
       "Coarse Aggregate      0\n",
       "Fine Aggregate        0\n",
       "Age                   0\n",
       "Strength              0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "concrete_data.isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data looks very clean and is ready to be used to build our model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Split data into predictors and target"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "concrete_data_columns = concrete_data.columns\n",
    "predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength\n",
    "target = concrete_data['Strength'] # Strength column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's do a quick sanity check of the predictors and the target dataframes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cement</th>\n",
       "      <th>Blast Furnace Slag</th>\n",
       "      <th>Fly Ash</th>\n",
       "      <th>Water</th>\n",
       "      <th>Superplasticizer</th>\n",
       "      <th>Coarse Aggregate</th>\n",
       "      <th>Fine Aggregate</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>540.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>162.0</td>\n",
       "      <td>2.5</td>\n",
       "      <td>1040.0</td>\n",
       "      <td>676.0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>540.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>162.0</td>\n",
       "      <td>2.5</td>\n",
       "      <td>1055.0</td>\n",
       "      <td>676.0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>332.5</td>\n",
       "      <td>142.5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>228.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>932.0</td>\n",
       "      <td>594.0</td>\n",
       "      <td>270</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>332.5</td>\n",
       "      <td>142.5</td>\n",
       "      <td>0.0</td>\n",
       "      <td>228.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>932.0</td>\n",
       "      <td>594.0</td>\n",
       "      <td>365</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>198.6</td>\n",
       "      <td>132.4</td>\n",
       "      <td>0.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>978.4</td>\n",
       "      <td>825.5</td>\n",
       "      <td>360</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Cement  Blast Furnace Slag  Fly Ash  Water  Superplasticizer  \\\n",
       "0   540.0                 0.0      0.0  162.0               2.5   \n",
       "1   540.0                 0.0      0.0  162.0               2.5   \n",
       "2   332.5               142.5      0.0  228.0               0.0   \n",
       "3   332.5               142.5      0.0  228.0               0.0   \n",
       "4   198.6               132.4      0.0  192.0               0.0   \n",
       "\n",
       "   Coarse Aggregate  Fine Aggregate  Age  \n",
       "0            1040.0           676.0   28  \n",
       "1            1055.0           676.0   28  \n",
       "2             932.0           594.0  270  \n",
       "3             932.0           594.0  365  \n",
       "4             978.4           825.5  360  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predictors.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    79.99\n",
       "1    61.89\n",
       "2    40.27\n",
       "3    41.05\n",
       "4    44.30\n",
       "Name: Strength, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "target.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, the last step is to normalize the data by substracting the mean and dividing by the standard deviation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Cement</th>\n",
       "      <th>Blast Furnace Slag</th>\n",
       "      <th>Fly Ash</th>\n",
       "      <th>Water</th>\n",
       "      <th>Superplasticizer</th>\n",
       "      <th>Coarse Aggregate</th>\n",
       "      <th>Fine Aggregate</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.476712</td>\n",
       "      <td>-0.856472</td>\n",
       "      <td>-0.846733</td>\n",
       "      <td>-0.916319</td>\n",
       "      <td>-0.620147</td>\n",
       "      <td>0.862735</td>\n",
       "      <td>-1.217079</td>\n",
       "      <td>-0.279597</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.476712</td>\n",
       "      <td>-0.856472</td>\n",
       "      <td>-0.846733</td>\n",
       "      <td>-0.916319</td>\n",
       "      <td>-0.620147</td>\n",
       "      <td>1.055651</td>\n",
       "      <td>-1.217079</td>\n",
       "      <td>-0.279597</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.491187</td>\n",
       "      <td>0.795140</td>\n",
       "      <td>-0.846733</td>\n",
       "      <td>2.174405</td>\n",
       "      <td>-1.038638</td>\n",
       "      <td>-0.526262</td>\n",
       "      <td>-2.239829</td>\n",
       "      <td>3.551340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.491187</td>\n",
       "      <td>0.795140</td>\n",
       "      <td>-0.846733</td>\n",
       "      <td>2.174405</td>\n",
       "      <td>-1.038638</td>\n",
       "      <td>-0.526262</td>\n",
       "      <td>-2.239829</td>\n",
       "      <td>5.055221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.790075</td>\n",
       "      <td>0.678079</td>\n",
       "      <td>-0.846733</td>\n",
       "      <td>0.488555</td>\n",
       "      <td>-1.038638</td>\n",
       "      <td>0.070492</td>\n",
       "      <td>0.647569</td>\n",
       "      <td>4.976069</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Cement  Blast Furnace Slag   Fly Ash     Water  Superplasticizer  \\\n",
       "0  2.476712           -0.856472 -0.846733 -0.916319         -0.620147   \n",
       "1  2.476712           -0.856472 -0.846733 -0.916319         -0.620147   \n",
       "2  0.491187            0.795140 -0.846733  2.174405         -1.038638   \n",
       "3  0.491187            0.795140 -0.846733  2.174405         -1.038638   \n",
       "4 -0.790075            0.678079 -0.846733  0.488555         -1.038638   \n",
       "\n",
       "   Coarse Aggregate  Fine Aggregate       Age  \n",
       "0          0.862735       -1.217079 -0.279597  \n",
       "1          1.055651       -1.217079 -0.279597  \n",
       "2         -0.526262       -2.239829  3.551340  \n",
       "3         -0.526262       -2.239829  5.055221  \n",
       "4          0.070492        0.647569  4.976069  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "predictors_norm = (predictors - predictors.mean()) / predictors.std()\n",
    "predictors_norm.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n_cols = predictors_norm.shape[1] # number of predictors\n",
    "n_cols"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"item1\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"item1\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's go ahead and import the Keras library"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using TensorFlow backend.\n"
     ]
    }
   ],
   "source": [
    "import keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the TensorFlow backend was used to install the Keras library."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's import the rest of the packages from the Keras library that we will need to build our regressoin model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.models import Sequential\n",
    "from keras.layers import Dense"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define regression model\n",
    "def regression_model():\n",
    "    # create model\n",
    "    model = Sequential()\n",
    "    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))\n",
    "    model.add(Dense(1))\n",
    "    \n",
    "    # compile model\n",
    "    model.compile(optimizer='adam', loss='mean_squared_error')\n",
    "    return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above function creates a model that has one hidden layer with 10 neurons and a ReLU activation function. It uses the adam optimizer and the mean squared error as the loss function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's import scikit-learn in order to randomly split the data into a training and test sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Splitting the data into a training and test sets by holding 30% of the data for testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train and Test the Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's call the function now to create our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# build the model\n",
    "model = regression_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we will train the model for 50 epochs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/50\n",
      " - 0s - loss: 1651.1822\n",
      "Epoch 2/50\n",
      " - 0s - loss: 1631.2427\n",
      "Epoch 3/50\n",
      " - 0s - loss: 1611.8665\n",
      "Epoch 4/50\n",
      " - 0s - loss: 1593.0129\n",
      "Epoch 5/50\n",
      " - 0s - loss: 1574.5028\n",
      "Epoch 6/50\n",
      " - 0s - loss: 1556.1718\n",
      "Epoch 7/50\n",
      " - 0s - loss: 1538.1068\n",
      "Epoch 8/50\n",
      " - 0s - loss: 1520.4636\n",
      "Epoch 9/50\n",
      " - 0s - loss: 1502.4612\n",
      "Epoch 10/50\n",
      " - 0s - loss: 1484.6265\n",
      "Epoch 11/50\n",
      " - 0s - loss: 1466.5279\n",
      "Epoch 12/50\n",
      " - 0s - loss: 1448.3623\n",
      "Epoch 13/50\n",
      " - 0s - loss: 1429.3800\n",
      "Epoch 14/50\n",
      " - 0s - loss: 1410.6920\n",
      "Epoch 15/50\n",
      " - 0s - loss: 1391.1524\n",
      "Epoch 16/50\n",
      " - 0s - loss: 1371.3942\n",
      "Epoch 17/50\n",
      " - 0s - loss: 1351.2343\n",
      "Epoch 18/50\n",
      " - 0s - loss: 1330.5209\n",
      "Epoch 19/50\n",
      " - 0s - loss: 1309.4222\n",
      "Epoch 20/50\n",
      " - 0s - loss: 1287.7469\n",
      "Epoch 21/50\n",
      " - 0s - loss: 1265.5437\n",
      "Epoch 22/50\n",
      " - 0s - loss: 1242.3534\n",
      "Epoch 23/50\n",
      " - 0s - loss: 1219.0512\n",
      "Epoch 24/50\n",
      " - 0s - loss: 1194.1156\n",
      "Epoch 25/50\n",
      " - 0s - loss: 1169.3435\n",
      "Epoch 26/50\n",
      " - 0s - loss: 1143.0799\n",
      "Epoch 27/50\n",
      " - 0s - loss: 1116.2312\n",
      "Epoch 28/50\n",
      " - 0s - loss: 1088.7420\n",
      "Epoch 29/50\n",
      " - 0s - loss: 1060.6200\n",
      "Epoch 30/50\n",
      " - 0s - loss: 1032.4794\n",
      "Epoch 31/50\n",
      " - 0s - loss: 1003.6558\n",
      "Epoch 32/50\n",
      " - 0s - loss: 974.9272\n",
      "Epoch 33/50\n",
      " - 0s - loss: 946.0666\n",
      "Epoch 34/50\n",
      " - 0s - loss: 917.1510\n",
      "Epoch 35/50\n",
      " - 0s - loss: 888.5678\n",
      "Epoch 36/50\n",
      " - 0s - loss: 859.9041\n",
      "Epoch 37/50\n",
      " - 0s - loss: 831.3088\n",
      "Epoch 38/50\n",
      " - 0s - loss: 803.0652\n",
      "Epoch 39/50\n",
      " - 0s - loss: 775.1644\n",
      "Epoch 40/50\n",
      " - 0s - loss: 747.1952\n",
      "Epoch 41/50\n",
      " - 0s - loss: 720.1013\n",
      "Epoch 42/50\n",
      " - 0s - loss: 693.5036\n",
      "Epoch 43/50\n",
      " - 0s - loss: 667.2098\n",
      "Epoch 44/50\n",
      " - 0s - loss: 641.2060\n",
      "Epoch 45/50\n",
      " - 0s - loss: 616.5792\n",
      "Epoch 46/50\n",
      " - 0s - loss: 592.0672\n",
      "Epoch 47/50\n",
      " - 0s - loss: 568.0495\n",
      "Epoch 48/50\n",
      " - 0s - loss: 545.4204\n",
      "Epoch 49/50\n",
      " - 0s - loss: 522.8632\n",
      "Epoch 50/50\n",
      " - 0s - loss: 501.0779\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.callbacks.History at 0x7fbde8453e10>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fit the model\n",
    "epochs = 50\n",
    "model.fit(X_train, y_train, epochs=epochs, verbose=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we need to evaluate the model on the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "309/309 [==============================] - 0s 73us/step\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "493.70862813906376"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "loss_val = model.evaluate(X_test, y_test)\n",
    "y_pred = model.predict(X_test)\n",
    "loss_val"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's import the mean_squared_error function from Scikit-learn."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import mean_squared_error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "493.70863791330663 0.0\n"
     ]
    }
   ],
   "source": [
    "mean_square_error = mean_squared_error(y_test, y_pred)\n",
    "mean = np.mean(mean_square_error)\n",
    "standard_deviation = np.std(mean_square_error)\n",
    "print(mean, standard_deviation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a list of 50 mean squared errors and report mean and the standard deviation of the mean squared errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MSE 1: 96.23834398880746\n",
      "MSE 2: 82.52934939190023\n",
      "MSE 3: 50.423970811961155\n",
      "MSE 4: 48.99604903532849\n",
      "MSE 5: 44.70407389668585\n",
      "MSE 6: 45.84412700072847\n",
      "MSE 7: 47.875012147773816\n",
      "MSE 8: 35.477522495109284\n",
      "MSE 9: 35.563788170181816\n",
      "MSE 10: 36.80308609255695\n",
      "MSE 11: 35.19219320105889\n",
      "MSE 12: 34.34737917211835\n",
      "MSE 13: 40.38878035313875\n",
      "MSE 14: 38.32519055956004\n",
      "MSE 15: 32.740622634640786\n",
      "MSE 16: 27.634737857337136\n",
      "MSE 17: 31.34054788879592\n",
      "MSE 18: 32.40728301755047\n",
      "MSE 19: 31.37120697490606\n",
      "MSE 20: 31.427853618239123\n",
      "MSE 21: 29.94852909223933\n",
      "MSE 22: 30.7034236080824\n",
      "MSE 23: 25.380024011852672\n",
      "MSE 24: 28.135550983515373\n",
      "MSE 25: 32.137249283034436\n",
      "MSE 26: 32.515969162234214\n",
      "MSE 27: 26.966679705771043\n",
      "MSE 28: 28.428698246533045\n",
      "MSE 29: 31.233512199426546\n",
      "MSE 30: 29.140160242716473\n",
      "MSE 31: 27.48099442515944\n",
      "MSE 32: 27.894654844956875\n",
      "MSE 33: 25.82127880355687\n",
      "MSE 34: 28.76602467904199\n",
      "MSE 35: 32.21809555103092\n",
      "MSE 36: 32.908837167190505\n",
      "MSE 37: 24.138178942658755\n",
      "MSE 38: 30.938592262638426\n",
      "MSE 39: 29.289363231473757\n",
      "MSE 40: 24.388125768371385\n",
      "MSE 41: 30.36188562633922\n",
      "MSE 42: 25.28865729643689\n",
      "MSE 43: 27.72400673924912\n",
      "MSE 44: 33.43268985192753\n",
      "MSE 45: 30.220784554589528\n",
      "MSE 46: 30.706770295078314\n",
      "MSE 47: 28.18811243791796\n",
      "MSE 48: 30.8071376961026\n",
      "MSE 49: 31.410146176236346\n",
      "MSE 50: 30.94078703451311\n",
      "\n",
      "\n",
      "Below is the mean and standard deviation of 50 mean squared errors with normalized data. Total number of epochs for each training is: 50\n",
      "\n",
      "Mean: 34.742920380277766\n",
      "Standard Deviation: 12.76633575833323\n"
     ]
    }
   ],
   "source": [
    "total_mean_squared_errors = 50\n",
    "epochs = 50\n",
    "mean_squared_errors = []\n",
    "for i in range(0, total_mean_squared_errors):\n",
    "    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)\n",
    "    model.fit(X_train, y_train, epochs=epochs, verbose=0)\n",
    "    MSE = model.evaluate(X_test, y_test, verbose=0)\n",
    "    print(\"MSE \"+str(i+1)+\": \"+str(MSE))\n",
    "    y_pred = model.predict(X_test)\n",
    "    mean_square_error = mean_squared_error(y_test, y_pred)\n",
    "    mean_squared_errors.append(mean_square_error)\n",
    "\n",
    "mean_squared_errors = np.array(mean_squared_errors)\n",
    "mean = np.mean(mean_squared_errors)\n",
    "standard_deviation = np.std(mean_squared_errors)\n",
    "\n",
    "print('\\n')\n",
    "print(\"Below is the mean and standard deviation of \" +str(total_mean_squared_errors) + \" mean squared errors with normalized data. Total number of epochs for each training is: \" +str(epochs) + \"\\n\")\n",
    "print(\"Mean: \"+str(mean))\n",
    "print(\"Standard Deviation: \"+str(standard_deviation))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
